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Graphical models are widely used to study biological networks. 
Interventions on network nodes are an important feature of many 
experimental designs for the study of biological networks. In this pa¬ 
per we put forward a causal variant of dynamic Bayesian networks 
(DBNs) for the purpose of modeling time-course data with interven¬ 
tions. The models inherit the simplicity and computational efficiency 
of DBNs but allow interventional data to be integrated into network 
inference. We show empirical results, on both simulated and experi¬ 
mental data, that demonstrate the need to appropriately handle in¬ 
terventions when interventions form part of the design. 

1. Introduction. Network inference approaches are widely used to study 
biological networks, including gene regulatory and signaling networks. Since 
processes underlying such networks are dynamical in nature, time-course 
data can help to elucidate regulatory interplay. Network inference methods 
for time-course data have been investigated in the literature, with contribu¬ 
tions including (among many others) Husmeier (2003), Bansal, Gatta and 
di Bernardo (2006), Hill et al. (2012). Scalable assays spanning multiple 
molecular variables continue to advance and network inference applied to 
such data offers the potential to provide biological insights over many vari¬ 
ables at once. Inferred networks can be used to generate testable hypotheses 
that are context specific in the sense of reflecting regulatory events in the 
specific cells under study [Maher (2012), Hill et al. (2012)]. In disease biology, 
such context-specific networks can be used to shed light on disease-specific 
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processes and thereby inform drug targeting and personalized medicine ap¬ 
proaches [Ideker and Krogan (2012), Akbani et al. (2014)]. 

Interventions, for example, gene knockouts, RNA-interference (RNAi), 
gene editing or inhibition of kinases, play an important role in experimental 
designs aimed at elucidating network structure. This is due to the fact that 
association does not imply causation: interventions can reveal whether a 
given node has a causal influence on another as opposed to merely being co¬ 
expressed. As data acquisition costs fall, interventional time-course designs 
are becoming more common. It is important to note that in interventional 
designs the number of interventions is often much smaller than the number 
of molecular variables (leave alone the number of possible interventions); this 
may be due to lack of suitable experimental interventions or cost or both. 
This means that causal edges cannot simply be directly identified from cor¬ 
responding interventional experiments; however, causal inference may still 
be possible using a small subset of all possible interventions [Hyttinen, Eber- 
hardt and Hoyer (2013)]. 

In this paper, we put forward an approach to network inference from time- 
course data with interventions. To fix ideas, we briefly introduce a data set 
that we study (and describe in detail) below and that motivated the work 
described here. The data comprise time-course assays of p = 48 signaling 
proteins in human cancer cell lines. Experiments were carried out under 
four conditions: no interventions; intervention on the AKT protein nodes; 
intervention on the EGER nodes; intervention on both AKT and EGER 
(all interventions were carried out using drugs that inhibit the enzymatic 
activity of the target, as we describe in detail below). Intuitively, the inter¬ 
ventional data are valuable because they give information not only on the 
causal influences of the target nodes (AKT and EGER), but also on the 
wider graph structure, since causal descendants of the target nodes are ex¬ 
pected to change under intervention. On the other hand, since the number 
of interventions carried out is small, a causal graph cannot be estimated 
by modeling of interventions alone. Rather, a network inference approach 
is needed that can model the time-course data itself as well as the changes 
seen under intervention and that is the goal of the present paper. 

From the perspective of causal inference [Pearl (2000, 2009)], interven¬ 
tional data require special treatment because the intervention modifies the 
causal graph and thereby the likelihood. We proceed within a graphical mod¬ 
els framework, combining ideas from Dynamic Bayesian Networks (DBNs) 
and Causal Bayesian Networks [CBNs, see Definition 1.3.1 in Pearl (2000)]. 
We focus on continuous data, as obtained in conventional biological time- 
course experiments. Interventions are accommodated by modifying the sta¬ 
tistical formulation for those experimental samples in which interventions 
were carried out, following ideas discussed in Eaton and Murphy (2007), 
Pearl and Bareinboim (2014) and Pearl (2000). Specihcally, for experiments 
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in which interventions are carried out, we modify the structure of the di¬ 
rected acyclic graph (DAG) that underlies the DBN and explore various 
parameterizations of the effect of the intervention. Our modeling of inter¬ 
ventions constitutes a pragmatic extension of DBNs to include causal opera¬ 
tions that allow analysis of mainstream experimental data. The approaches 
we propose can be described in terms of CBNs and the “do” operator of 
Pearl (2000) applied to the DAG underlying a DBN, as we discuss further 
below. We therefore refer to them as “Causal Dynamic Bayesian Networks” 
(CDBNs). However, DBNs themselves are not causal models and full formal 
justification of the approaches we propose requires additional assumptions, 
including assumptions on the extent and form of the effect of interventions 
and, for observational or partially interventional data, on the absence of hid¬ 
den common causes. Full discussion of causal semantics is beyond the scope 
of this paper, but we refer the interested reader to Pearl (2000) for further 
discussion. 

The remainder of the paper is structured as follows: First, a Bayesian 
framework for network inference using DBNs is outlined in Sections 2.1.1 
to 2.1.4. Next, the interventional models that constitute our main focus 
are described in Section 2.2. We illustrate some key points of the approach 
using examples in Section 2.3. Empirical results appear in Section 3. We 
apply the methods on both simulated and experimental protein signaling 
data, exploring the behavior of a number of approaches by which to model 
interventions, and comparing their performance with respect to network 
reconstruction. We find that in the context of interventional data, analyses 
that do not account for interventions do not perform well. We close with 
discussion of open questions and future prospects. 

An R package “interventionalDBN” for network inference using inter¬ 
ventional data is available on GRAN and on the author’s website: www2. 
Warwick.ac.uk/fac/sci/statistics/staff/academicresearch/spencer 

2. Methods. We fix ideas and notation by first reviewing the “classical” 
DBN formulation (without interventions). We then go on to discuss in detail 
how the likelihood can be modified to account for interventions. Taken to¬ 
gether, this gives an overall approach by which to perform structural network 
inference from time-course data that includes interventions acting upon a 
subset of the nodes. 

2.1. Dynamic Bayesian network model. A DBN uses a graph to describe 
probabilistic relationships between variables through time, with associated 
parameters specifying the temporal relationships. Following Friedman, Mur¬ 
phy and Russell (1998), Murphy (2002), Husmeier (2003), we consider DBNs 
with edges forward in time only (i.e., first-order Markov with no within-time- 
slice-edges) and assume stationarity in the sense that neither network topol¬ 
ogy nor parameters change through time (in what follows, we use “DBN” 
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to refer to this specific class of DBN). Then, each variable at time t de¬ 
pends only on its regulators at the previous time step. Further, since the 
graph structure does not change with time and edges are always forward in 
time, the topology can be described by a graph G with exactly one vertex 
for each protein under study and edges understood to mean that the child 
depends on the state of the parent at the previous time-step. Note that the 
DBN model is in fact a DAG; the graph G introduced above can be used 
to construct a DAG with one vertex for each variable at each time point; 
this is known as the “unrolled” graph [see, e.g.. Hill et al. (2012) for further 
details]. Operations on DBNs can be described in terms of this underlying 
DAG, but the p-vertex graph G offers a convenient summary. 


2.1.1. Statistical formulation. Let Xj^c,t denote log-expression of variable 
j G {1,... ,p}, at time t G {0,..., T — 1} in the time course obtained under 
experimental conditions c G C. We use X = {xj^c,t} to denote the complete 
data set. The edge set of the graph G is E{G). Let = {i: {i,j) € E{G)} 
denote the set of parents for node j. Then, for conditions c without inter¬ 
vention, the DBN model we consider (for node j) is 


( 1 ) 




< zG70') 

_L c- 

\,Oi2 + 


t > 0, 


t = 0, 


(i) 

where /31 ^ denotes parameters that govern the dependence on parent nodes 
in the previous time step, af , 0.2 are intercept parameters that do not 
depend on the parent set and ej^c,t ~ E[{0,a‘j) is a noise term. The 
use of two intercept parameters, one for the initial time point, allows the 
model more flexibility to incorporate the effects of the parents acting on 
the first observation. Modeling the initial observation also provides extra 
degrees of freedom, unless the experimental design has only one unreplicated 
experimental condition. 


2.1.2. Variable selection. Under the stationarity and Markov assump¬ 
tions above, there is a close relationship between inference concerning the 
DBN graph G and variable selection for the above regression formulation. 
As discussed in detail in Hill et al. (2012), exploiting this connection al¬ 
lows efficient inference regarding the graph G. Specifically, if P(z G 
is the posterior probability that variable i appears in the regression model 
for variable j above (i.e., the posterior inclusion probability in the variable 
selection sense) and assuming a modular graph prior P(G) (i.e., a prior that 
can be factorized over nodes), we have 

P((z,j)G.E(G)|X)=P(zG7(''^|X) 
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= /(iG7)P(7^'^=7|X), 

■yeM 

where A4 denotes the set of all possible variable subsets and I is the indicator 
function (for simplicity we assume that M does not depend on j, but it could 
do so). 

Thus, due to the structure of the DBN model, for estimation of posterior 
probabilities of edges in graph G, it suffices to perform variable selection 
for each node in turn, with variables at the previous time point considered 
as potential predictors. To ease notational burden, we leave dependence on 
node j implicit in the following sections. Let x = {xj^c,t} denote all observa¬ 
tions for protein let n (= T x |C|) be the total number of such observations. 
Then, model (1) can be written as 

(3) X = XqO; “F X^/3 T £, 

where X.y denotes the matrix formed by selection of columns of X corre¬ 
sponding to indices 7 , e ~ iV„(On, T^In), Nn denotes the n-dimensional mul¬ 
tivariate Normal distribution, is the n-dimensional vector of zeros and 
Iji is the n X n identity matrix. The design matrix is split into two parts: 
Xq, which is the same for every model and has parameter vector oc; and 
X..^, which depends on the choice of parents given by 7 and has parameter 
vector p. Let a be the length of o: and b be the length of /3, then Xq has 
dimension nx a and X..y has dimension nxb. Following equation (1), we see 
that here a = 2 and Xq = [lp>o}l{t=o}]nx 2 - In the absence of interventions, 
observations of the parent proteins from the previous time point form the 
columns of X^ (we discuss interventions below). For the first observation, 
where there are no previous observations, zeros are inserted into X..^ in the 
place of the parent observations. 

We can assume without loss of generality that the two parts of the design 
matrix (Xq and X.y) are orthogonal, that is, Xj^X^ = 0^x6• This reparame¬ 
terization ensures the predictors have mean zero; for details see supplemen¬ 
tary material [Spencer, Hill and Mukherjee (2015)]. 

2.1.3. Marginal likelihood. The marginal likelihood p(x| 7 ) for node j is 
obtained by marginalizing over all model parameters, that is, 

(4) P(x|7) = J p(x|0,7)p(0|7) de, 

where 6 = (ct, /3, cr) is the full set of model parameters. We make use of widely 
used parameter priors from the Bayesian literature [Denison et al. (2002)]. 
First, we use improper priors for cx and cr, namely, that p{cXja\'^) oc - for 
cr > 0. Note that as this prior is improper, for meaningful comparisons to 
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be made between models in Ai, this prior must be the same for all of the 
models. Second, we use Zellner’s g'-prior for the regression coefficients so 
that / 3 |(q!,c 7 , 7 ) ~ ga'^. Following Smith and Kohn (1996), 

Kohn, Smith and Chan (2001), we set g = n. With this prior the covariance 
matrix for /3 is proportional to (X!^X..y)“^, which has some nice proper¬ 
ties, for example, invariance to rescaling of the columns of X.y [Smith and 
Kohn (1996)]. Using standard results [Denison et al. (2002)], the marginal 
likelihood is then given in closed form as 

liT / / \ \ 

( 5 ) = 

where Pq = Xo(X^Xo)~^X^, = X.y(X!^X^)“^X!^ and the normalizing 

constant X = ir(^) 7 r-(”-“)/ 2 |x;fXo|-^ 

We also wish to consider the model 7 = 0 (in which 6 = 0). The regression 
equation is simply x = XoQ: + £ and the marginal likelihood is given by 

p(x |7 = 0 ) = K'(x^(I„ — Po)x)“("““)/^. 

2.1.4. Model prior. Following Scott and Berger (2010), we include a mul¬ 
tiplicity correction to properly weight models in light of the number of pos¬ 
sible parent sets. Since there are (|) possible models for node j with k 
parents, the prior probability is chosen so that for absent prior information 
on specific edges we have P( 7 *^'^) = 7 ) oc (|^|) 

We may also wish to include existing biological knowledge in the model 
prior, which we do by specifying a prior network Gq, following Werhli and 
Husmeier (2007), Mukherjee and Speed (2008), Hill et al. (2012). Such a 
prior network is based on causal biochemistry and should be regarded as a 
prior on causal structures. A penalty is applied to each candidate graph G 
based on the number of edge differences with the prior graph Gq. That is, 

( 6 ) P( 70 ) = 7 ( 7 ^) oc exp(-A(| 7 \ 7 j^^| + | 7 j^^\ 7 |)), 

where 'Jq ' is the parent set of node j in the prior graph Gq and A is a scalar 
hyperparameter that controls the strength of the prior. Detailed discussion 
of informative priors for networks is beyond the scope of this paper; we refer 
interested readers to the references above for further discussion. The prior 
graph used for results reported in Section 3.2 is shown in Spencer, Hill and 
Mukherjee (2015). The prior strength parameter was chosen subjectively to 
be A = 4. 

2.1.5. Computation. Combining the marginal likelihood (5) and model 
prior P( 7 ('^)) gives the posterior P( 7 ('^)|x) ocp(x| 7 ('^))P( 7 ('^)) over parent sets 
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(so far, without interventions). Posterior probabilities for individual edges 
in the graph are obtained directly from the posterior over parent sets by 
(2). As discussed in detail in Hill et al. (2012), placing a bound m on graph 
in-degree, following common practice in structural inference for graphical 
models [e.g., Husmeier (2003)], allows exact computation of the posterior 
scores. 

2.2. Modeling interventions. In statistical terms, interventions may al¬ 
ter the edge structure of the graphical model, or model parameters, or both. 
Here, we discuss the modeling of interventional data as a causal extension of 
the DBN model outlined above, which we call a Causal Dynamic Bayesian 
Network (CDBN). For experimental conditions c that involve an interven¬ 
tion, Section 2.2.1 below outlines different approaches by which to form the 
likelihood pc{x\G,9c) for an interventional experiment c. We first give a 
general typology of interventions following Eaton and Murphy (2007), with 
extensions to accommodate the wide range of interventions seen in biological 
experiments, and then go on to discuss kinase inhibition in more detail. 

It is important to note that throughout we assume that the nodes targeted 
by the interventions are known and so the intervention has no additional un¬ 
modeled effects elsewhere in the network, an assumption which is integral 
to the definition of a CBN [Pearl (2000)]. We also assume that interventions 
are in effect during the entire period of the experiment (e.g., a gene that 
is knocked out remains knocked out throughout). This is a reasonable as¬ 
sumption for mainstream interventional designs, but for some interventions 
that are mediated by reversible biochemistry this may require that the time 
course is of appropriate total length. Although we do not pursue this direc¬ 
tion in this paper, we note that since the approaches described here provide 
a likelihood that incorporates interventions, they could in principle be used 
to estimate the targets of interventions. 

2.2.1. Approaches for modeling interventions. In a perfect intervention 
certain edges that the target node participates in are removed. We call an 
intervention that corresponds to removal of edges leading out of the tar¬ 
get node a perfect-out intervention and one that corresponds to removal of 
edges leading into the target node a perfect-in intervention. For example, a 
knockout with known target gene j can be thought of as externally setting 
the transcription level of node j to zero. This removes the causal influ¬ 
ence of other nodes on j and therefore constitutes a perfect-in intervention. 
However, since the change to j may have causal influences on other nodes, 
outgoing links are allowed to remain. 

When applied to a DBN, such an intervention corresponds to a compound 
“do” [Pearl (2000)] that operates on multiple nodes in the underlying un¬ 
rolled DAG. For example, the knockout of gene j mentioned above would 
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correspond to do(Xj^o = 0 ,... = 0), where Xj^t is the vertex (and 

associated random variable) in the unrolled graph corresponding to gene j 
at time t. 

In a mechanism change intervention the structure of the graph remains 
unchanged, but parameters associated with edges that the target partic¬ 
ipates in are allowed to change. In a mechanism-change-out intervention, 
parameters are re-estimated for the case where the target is a parent; in 
a mechanism-change-in intervention parameters are re-estimated when the 
target is the child. 

In a fixed-effect intervention, the effect of the inhibitor is modeled by an 
additional, additive parameter in the regression equation. In a fixed-effect- 
in intervention the effect appears in the equation for the target itself, while 
in a fixed-effect-out intervention the effect appears in the equations for the 
children of the target. These formulations can be useful in settings where 
the intervention results in a change in the average level of the target or its 
causal descendants. All of the intervention models can be described within 
the framework of CBNs using the “do” operator of Pearl [Pearl (2000), Pearl 
and Bareinboim (2014)]; for more details see Spencer, Hill and Mukherjee 
(2015). 

In our empirical results, we focus on a specific type of intervention, namely, 
drug inhibition of kinases, as used in studies of protein signaling. This ap¬ 
plication illustrates the need to consider the biological mechanism of the in¬ 
tervention in selecting from the interventional formulations outlined above. 

Kinase inhibition blocks the kinase domain of the target, removing the 
ability of the target to enzymatically influence other nodes. However, such 
inhibitors may not prevent phosphorylation of the target itself. Therefore, 
we focus on “-out” interventions for modeling kinase inhibitors. These inter¬ 
vention models can be used in combination (see Figure 1) to reflect under¬ 
standing of the biological action of the interventions. The perfect and mech¬ 
anism change intervention models cannot be used together, as this would 
introduce a column of zeros into the design matrix. Perfect interventions in 
combination with fixed-effect interventions are well suited to modeling ki¬ 
nase inhibition using log-transformed data, since they capture the blocking 
of enzymatic ability and also allow estimation of the quantitative effect of 
inhibition on child nodes. 

Any extra parameters introduced by the intervention models are handled 
in exactly the same way as the existing regression coefficients denoted by (3 
in equation (3). The design matrix is augmented to include the effects 
of the interventions in the conditions when they are active and so once the 
parameters have been integrated out, the marginal likelihood takes the same 
form as before [equation (5)]. Regressions that include causal components 
can be described within the framework of Structural Equation Models [for 
a comprehensive discussion see Chapter 5 of Pearl (2000)]. Full technical 


NETWORK INFERENCE WITH INTERVENTIONS 


9 


No intervention 

Perfect 

Meclianisin cliange 

X^Y 

X^Y 

X^ll^Y 

Xi H X Y 

Xi H X Y 

Xi H X iLK-,- Y 

Fixed effect 

Perfect and 

Meclianisin cliange 


fixed effect 

and fixed effect 

X^Y 

X^Y 

X^Y 

Xi H X Y 

Xi H X +'* Y 

XiHX^^l^Y 


Fig. 1. Diagrammatic representation of intervention models in their “-out” forms, for 
variables X, Y with directed acyclic graph X —>■ Y; ‘XHX” denotes inhibition of variable 
X by an inhibitor Xi (see text for details). 

details of how to apply these interventions in practice are given in Spencer, 
Hill and Mukherjee (2015), along with a toy example illustrating their ap¬ 
plication. 

2.3. Protein data example. We now illustrate the foregoing approaches 
using a simple, real data example (Figure 2) in which a known three-node 
network is interrogated by inhibition (data courtesy Gray Lab, OHSU Knight 
Cancer Institute, Portland, OR, USA). Three phospho-proteins—the recep¬ 
tor EGFR, phosphorylated on tyrosine residue #1173 (“EGFRpY1173”), 
and two nodes downstream of EGER, namely, AKTpS473 and MAPKpT202— 
were observed through time under several experimental conditions. The con¬ 
ditions included the following: no inhibitors (green in Eigure 2), with an AKT 
inhibitor (blue), with an EGER inhibitor (red) and with both EGER and 
AKT inhibition (purple). In line with the known network, the data show a 
clear reduction in the observed level of AKTpS473 and MAPKpT202 under 
EGER inhibition. 

To investigate the behavior of the interventional schemes described above, 
we carried out network inference for these data using a CDBN with the re¬ 
spective intervention scheme (in their “out” forms). We show the data itself, 
posterior expected fitted values obtained via model averaging (hereafter ab¬ 
breviated to htted values) from the various models and the corresponding 
inferred networks. Strikingly, although several of the methods fit the data 
reasonably well, only fixed effect and perfect fixed effect are able to both fit 
the data and estimate what we believe to be the correct network. 

It is noteworthy that even in this simple example it is possible to fit the 
data well while estimating a plainly incorrect network. Eor example, the no 
intervention model fits the data (including the inhibitor time courses) rea¬ 
sonably well, but does not estimate the known edges from EGER to MARK 
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Fig. 2. Real data illustration of the behavior of interventional and noninterventional approaches. Data (circles; expression level vs time 
index) are from a breast cancer cell line AU565 for three proteins EGFRpY1173, AKTpSflS and MAPKpT202. The data were modeled 
using DBN (no intervention) and CDBN with mechanism change, perfect, fixed effect, and perfect with fixed effect interventions (the 
latter all in their “out” form). Fitted values are shown as lines; inferred networks are shown in the last row, with marginal posterior edge 
probabilities shown in the legends. “True network” indicates what we believe to be the correct causal graph (AKT and MAPK are known 
to be downstream of FGFR, these edges are verified by the interventional data shown here). 
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and AKT, despite the fact that both MAPK and AKT change dramatically 
under EGFR inhibition in the very data being analyzed. This is an example 
of statistical confounding that arises due to the fact that the data are ana¬ 
lyzed “blind”: the analysis does not know which time course was obtained 
under EGFR inhibition, rendering the easily seen causal effect of EGFR 
on AKT and MAPK invisible to network inference. In contrast, the fixed- 
effect intervention approaches can directly incorporate this information in 
the overall network inference. Note also that the inhibitors can be seen to 
affect the concentration of their target proteins, most likely due to feedback 
mechanisms that are represented by self-edges in the estimated network. For 
more discussion about the role of the self-edge in the network, see Section 4. 

3. Results. 

3.1. Simulation study. We performed a simulation study to compare the 
network inference methods with different intervention models. Data for 15 
nodes were simulated from a GDBN using a data-generating graph G* [see 
Figure S3 in Spencer, Hill and Mukherjee (2015)]. Mimicking the design of 
typical real proteomic experiments, for each protein we simulated a small 
number of time points (8) in four experimental conditions (no inhibitor; in¬ 
hibition of node “A”; inhibition of node “B”; inhibition of both node “A” 
and node “B”), giving re = 32 multivariate datapoints. We sampled coeffi¬ 
cients for the node-specihc linear models uniformly at random from the set 
(—1, —0.5) U (0.5,1) in order to create associations of various strengths, while 
avoiding associations close to zero. To simulate data under in silico inhibi¬ 
tion requires an intervention model: since each interventional scheme also 
corresponds to such a data-generating model, to avoid bias, we simulated 
data based on all four intervention models that were considered, namely, 
perfect, fixed effect, perfect with hxed effect and mechanism change (all 
in their “out” forms). Network inference was then carried out as described 
above using these four intervention models plus the model with no interven¬ 
tion and simple, marginal correlations between nodes (i.e., a co-expression 
network). 

Figure 3 shows ROG curves (produced from 20 data sets for each regime) 
for each combination of intervention method and the underlying model. 
These curves plot true positive rates (with respect to edges in the data- 
generating graph) against false positive rates across a range of thresholds 
on marginal posterior edge probabilities. In each case and as expected, anal¬ 
ysis under the data-generating model gives the best results. However, the 
perfect and fixed-effect model does consistently well and generally performs 
almost as well as inference using the data-generating model. The mecha¬ 
nism change model generally appears to perform similarly to the perfect 
intervention model. Co-expression analysis does much less well than all of 
the GDBN models. Note that even under the correct data-generating model, 
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True model: perfect 


True model: fixed effect 




True model: perfect & fixed effect 


True model: mechanism change 




Eig. 3. Simulated data results. Data were generated based on (a) perfect (b) fixed effect, 
(c) perfect and fixed effect and (d) mechanism change intervention models (all in their 
“out” form) and analyzed using CDBNs coupled to these four intervention models, plus a 
classical DBN (“no intervention”) and a baseline co-expression analysis (“correlations”; 
see text for details). Receiver Operating Characteristic (ROC) curves for each method in 
each data-generating regime are shown; the crosses correspond to the point estimate of the 
network obtained by thresholding marginal posterior edge probabilities at 1/2. 


in this noisy, small sample example, the area under the ROC curve can be 
much lower than unity, highlighting the inherent difficulty of the network 
inference problem and the challenging nature of the simulation. 

3.2. Cancer cell line data. 

3.2.1. Data. Phospho-protein time courses were obtained from two breast 
cancer cell lines (AU565 and BT474) for 48 proteins using reverse-phase pro- 
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tein arrays (data courtesy Gray Lab, Knight Cancer Center, OHSU, Port¬ 
land, OR; these data form part of a larger, ongoing study covering a broad 
panel of breast cancer cell lines and a larger set of proteins). Data comprised 
8 time points (0.5, 1, 2, 4, 8 and 24 hours following Serum stimulation) in 
4 experimental conditions: no inhibitor (DMSO); EGFR inhibition by La- 
patinib (“EGFRi”, at a dose of 250 nM);^ AKT inhibition by GSK690693 
(“AKTi”, at 250 nM); and inhibition by both EGFRi and AKTi (each at 
250 nM). This gave n = 32 datapoints for each protein. For further details 
concerning experimental protocol see Spencer, Hill and Mukherjee (2015). 

3.2.2. Cell line specific networks. The two cancer cell lines studied differ 
in terms of the genetic alterations that they harbor [Neve et al. (2006)] and 
may differ in terms of underlying signaling network topology. To avoid aggre¬ 
gating potentially heterogeneous data, we analyzed the cell lines separately 
to obtain cell line-specific networks. Figure 4 shows the inferred networks 
for the two cell lines. The edges highlighted in green are not inferred with 
the conventional DBN without interventions [the full networks inferred by 
the no intervention DBN are shown in Spencer, Hill and Mukherjee (2015)]. 
We discuss network validity below, but note that full validation of the cell 
line-specific networks requires further experimental work and is beyond the 
scope of this paper. 

3.2.3. Network validity. In this real data example, the true data- 
generating networks are not known. However, since the experimental de¬ 
sign includes interventions, the relevant data (EGFRi and AKTi) can be 
used to test the causal validity of the estimated networks downstream of the 
inhibited nodes. For example, suppose a node k changes under inhibition of 
AKT. This means that k is downstream of AKT; since the observation is 
made under external manipulation of AKT, we can say that k is a descen¬ 
dant of AKT in the underlying causal graph. Testing each node for change 
under AKTi (this was done using a paired t-test at the 5% level) gave a set 
Dakt of nodes downstream of AKT that could be compared against the 
corresponding set of descendants from the inferred networks. This was done 
in an ROC-sense in the following way for each cell line. First, we thresholded 
posterior edge probabilities at r to obtain a network Gr, which then gave an 
estimated set of descendants of AKT, Dakt{t). The number of true and false 
positives at threshold r are then |I)akt(t) PiZIaktI and \Dakt{t) \L>akt|) 
respectively. Varying threshold r then gave an ROC curve assessing ability 
to recover causal descendancy across the full range of thresholds. To ensure 
that our inference approach could uncover direct associations and that our 


^Lapatinib is a dual EGFR/HER2 inhibitor. 
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Fig. 4. Estimated networks for cell lines AU565 and BTflf. Data were analyzed using a CDBN with informative prior and perfect 
fixed effect out interventions; all edges with posterior probability greater than 0.5 are shown. Highlighted edges are not present when no 
intervention model is used. Proteins with no edges are not shown. 
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Fig. 5. Real data results. Receiver Operating Characteristic (ROC) curves showing agree¬ 
ment of estimated networks with changes observed under experimental intervention; the 
crosses correspond to the point estimate of the network obtained by thresholding marginal 
posterior edge probabilities at 1/2. Upper panels show results based on analysis of descen- 
dancy in estimated networks; lower panels show corresponding results using direct children 
in estimated networks (see text for details). Left panels compare CDBNs using various in¬ 
terventional models against each other; right panels compare selected CDBNs with several 
existing approaches (as detailed in text). 


conclusions did not depend solely on the form of this analysis, we also com¬ 
pared the set of significant downstream nodes H^akt with the inferred set of 
direct children of AKT for each posterior edge probability threshold. These 
variants of the first two ROC plots are shown in Figure 5(c) and (d). 

Figure 5(a) shows the ROC curves for each of the intervention approaches 
considered, combined across both inhibited proteins (EGFR and AKT) and 
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cell lines (AU565 and BT474). The fixed-effect approach has the highest 
ROC curve area (0.830), marginally ahead of perfect fixed effect (0.823), 
which has the posterior median network with the highest true positive rate. 
The perfect fixed-effect model is compared with other methods in the litera¬ 
ture in Figure 5(b), including DDEPN [Bender et al. (2010)] and a Gaussian 
process-based method due to Aijo and Lahdesmaki (2009), which does not 
explicitly model interventions. The influence of the prior network is also 
explored. 

Due to the limited size of the data set, it is not feasible to leave out the 
inhibitor data used to produce the ROC curves and still carry out network 
inference. The results shown should therefore be regarded as an assessment 
of “causal fit” rather than a validation of causal links. It is noteworthy that 
the existing approaches, including classical DBNs, are not able to correctly 
estimate the causal dependencies even for the interventions and protein levels 
that are present in the training data itself. 

4. Discussion. Recently, there has been interesting work on explicitly 
causal methods for networks, including linear [Maathuis, Kalisch and 
Biihlmann (2009)] and nonlinear [Oates and Mukherjee (2012)] models. In¬ 
terventional data are important for elucidation of causal links. However, 
standard DBNs are not appropriate for modeling interventional data. By 
modeling interventions explicitly we were able to extend DBNs in a causal 
direction. We discussed and illustrated the issues of confounding that can 
arise in network inference. As we showed using real data, nodes not linked 
in terms of regulation can nonetheless exhibit statistical association and 
thereby easily lead network inference astray. We showed how such confound¬ 
ing can present a concern even with only three nodes, but the issue becomes 
rapidly more severe in higher dimensions. 

The posterior edge probabilities that we report are not truly causal quan¬ 
tities. In principle, it could be possible to instead consider causal coefficients 
calculated via the do-calculus [as in (2009)]. However, since interventions 
in time-course experiments are in fact compound do operations (applying 
to multiple time points and therefore multiple nodes in the unrolled DAG), 
calculation of causal coefficients is more complicated than in the static DAG 
case, and we are not aware of a simple way to proceed in this setting, even for 
the linear models considered here. On the other hand, in contrast to static 
DAGs as in (2009), for feed-forward DBNs of the type considered here, the 
underlying DAG is identifiable (i.e., the equivalence class always contains 
exactly one graph). We showed empirically that the posterior edge probabil¬ 
ities we reported provide useful information on causal edges, but we do not 
currently fully understand the relationship between such measures [as used 
here and in most mainstream Bayesian approaches for biological network 
inference, including, among others, Husmeier (2003), Hill et al. (2012)] and 
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the corresponding causal coefficients, and further work in this area would be 
valuable. We reiterate that causal interpretation of CDBNs requires addi¬ 
tional assumptions that go beyond those needed to justify conditional inde¬ 
pendence statements. However, as noted by Dawid (2007) in the context of 
static DAGs, such assumptions are generally difficult, if not impossible, to 
check. Therefore, empirical validation of causal inference remains a crucial 
direction for future work. 

Results on simulated data suggested that the “perfect-hxed-effect-out” 
intervention scheme we proposed represents a good default choice for kinase 
inhibition experiments. We conjecture that “perfect-fixed-effect-in” inter¬ 
ventions may represent a good default approach for analysis of gene expres¬ 
sion time-course data obtained under knockouts and RNAi knockdowns, 
but we did not explore such data here. We recommend that an intervention 
model should be chosen in line with the mechanism of the intervention un¬ 
der consideration. In situations were biochemical knowledge is insufficient, 
it may be possible to treat the choice of interventional regime as a model 
selection problem, but we did not explore this possibility here. 

The networks shown in Figure 4 reflect several features that are typi¬ 
cal of protein signaling, including a cascade-type structure originating from 
the receptor EGFR. The edges highlighted in green show the changes in 
the network that are induced by modeling the interventions, and the im¬ 
provement in the ROC curve in Figure 5(a) suggests that using the per¬ 
fect hxed-effect intervention model has produced a more accurate network, 
particularly around the inhibited proteins. Since these edges (in green) are 
inferred only when the inhibition is taken into account, they may be more 
likely to reflect causal information. There are more differences between the 
two cell lines than might have been expected. These differences may be real 
or may be due to some of the many limitations inherent to biological network 
analyses, including experimental caveats and limitations of the inference ap¬ 
proach and causal models. However, experimental validation of the networks 
is beyond the scope of this paper. 

The ROC curves in Figure 5 show the perfect fixed-effect model performs 
better than several other approaches. However, the poor performance of the 
no intervention DBN model (which is identical except for the modeling of 
interventions) demonstrates that this success is not based on the network 
inference scheme or the prior, but on the appropriate handling of inter¬ 
ventions. Surprisingly, the Gaussian process method performs better overall 
than DDEPN, even though the latter models interventions. This may be due 
to the fact that the inference is conducted over a relatively large network 
(48 proteins) and DDEPN suggests a very small set of potential edges. 

The self-edge (the edge that connects a protein to itself) has two roles in 
the model. First, it can represent statistical autocorrelation in the protein 
time course. Second, it can represent a (negative or positive) feedback loop. 
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possibly via some additional unmeasured variables. Since we integrate out 
the regression coefficient to obtain the marginal likelihood, the posterior 
signaling network does not give any indication of the sign of any feedback, 
nor the role of the self-edge. In future work we hope to differentiate between 
inhibition and activation effects in the signaling network, helping to clarify 
the role of the self-edges. 
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SUPPLEMENTARY MATERIAL 

Supplement to “Inferring network structure from interventional time- 
course experiments” (DOI: 10.1214/15-AOAS806SUPP; .pdf). Additional 
technical information about orthogonalization, the experimental procedure 
and the intervention models, including a toy example. Supplementary figures 
showing the prior network, the “true” network used for simulations and the 
posterior signaling networks without interventions. 
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